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We consider mark-recapture-recovery (MRR) data of animals 
where the model parameters are a function of individual time- varying 
continuous covariates. For example, the survival probability of an in- 
dividual may be a function of condition, with weight used as a proxy 
for this underlying condition. For time-varying individual covariates, 
the covariate value is unobserved if the corresponding individual is 
unobserved, in which case the survival probability cannot be evalu- 
ated. For continuous-valued covariates, the corresponding likelihood 
can only be expressed in the form of an integral that is analytically 
intractable, and, to date, no maximum likelihood approach that uses 
all the information in the data has been developed. We accomplish 
this task by formulating the MRR setting in a state-space framework 
and considering an approximate likelihood approach which essentially 
discretises the range of covariate values, reducing the integral to a 
summation. Assuming a first-order Markov structure for the covari- 
ate values, the likelihood can be efficiently calculated and maximized 
using standard techniques for hidden Markov models. We initially as- 
sess the approach using simulated data before applying to real data 
relating to Soay sheep. 

1. Introduction. Mark-recapture- recovery (MRR) data are commonly 
collected on animal populations in order to gain some understanding of 
the underlying system. Data are collected by repeated surveyings of the 
population under study. In the initial survey all individuals that are observed 
are uniquely identified (via natural features or by applying some form of 
mark, such as a ring or tag) and released back into the population. At 
each subsequent survey all individuals observed are recorded, and those that 
have not previously been observed are again uniquely identified, before all 
are released back into the population. We assume that individuals can be 
observed alive or recovered dead in each survey. The resulting MRR data 
can be summarised as the observed encounter histories for each individual 
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observed within the population, detailing for each survey event whether 
an individual was observed alive and/or recovered dead. Conditioning on 
the initial capture time of each individual leads to the Cormack- Jolly-Seber 
(CJS) model (see Schwarz and Seber, 2000, for a review of these models). 
The corresponding MRR likelihood function of these data can be written as 
a function of survival, recapture and recovery probabilities. 

Recent research has focussed on linking environmental and individual 
covariates to demographic parameters, most notably the survival probabil- 
ities, in order to explain temporal and individual variability (Brooks et al., 
2000; Catchpole et al, 2000; Coulson et al, 2001; Pollock, 2002; King and 
Brooks, 2003; King et al, 2006; Gimenez et al, 2006; Catchpole et al, 2008; 
Schofield and Barker, 2011, to name but a few). We consider individual time- 
varying continuous covariates. These have traditionally been difficult to deal 
with due to the missing covariate values (if an individual is unobserved the 
corresponding covariate value is also unknown). The corresponding likeli- 
hood is expressible in the form of an integral, integrating over the set of 
missing covariate values, but is analytically intractable. One of the initial 
approaches to dealing with such covariates was to (coarsely) discretise the 
covariate space, essentially defining discrete covariate "states". Nichols et 
al. (1992) considered data relating to meadow voles {Microtus pennsylvan- 
icus) and categorised weight into four different categories. Such a discreti- 
sation reduces the model to the Arnason- Schwarz model (Brownie et al, 
1993; Schwarz et al, 1993). Transition probabilities between the covariate 
states are estimated within the optimisation of the likelihood (possibly with 
additional restrictions on the possible state transitions). With the coarse 
discretisation arbitrarily defined, this approach leads to a (potentially sig- 
nificant) loss of information. More recently, Bayesian approaches have been 
proposed (Bonner and Schwarz, 2006; King et al, 2008), and the corre- 
sponding model fitted using a data augmentation approach (Tanner and 
Wong, 1987). This involves treating the unobserved continuous covariate 
values as parameters (or auxiliary variables) that are essentially integrated 
out numerically within the associated Markov chain Monte Carlo (MCMC) 
algorithm. Alternatively, Catchpole et al. (2008) have proposed a conditional 
likelihood approach (often referred to as "trinomial approach"). By condi- 
tioning on only the observed covariate values, this approach results in a 
simple, closed-form likelihood expression. However, this involves discarding 
a proportion of the available data, leading to a decreased precision of the 
parameter estimates. See Bonner et al. (2010) for further discussion and a 
comparison of the Bayesian and trinomial approaches. 

For the considered type of MRR data, Bonner et al. (2010) state that 
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"except when few values are missing, the large number of integrals [...] will 
make it impossible to perform maximum likelihood estimation". We claim 
that this statement is not strictly true, and present a novel approach based 
on a hidden Markov-type formulation of the MRR setting. This formulation 
leads to a likelihood that is easy to compute and to maximize numerically. 
The underlying idea is to finely discretise the space of possible covariate 
values, which we achieve by a numerical integration of the likelihood func- 
tion. The numerical integration enables us to augment the resulting discrete 
space of covariate values with the state space of the survival process, lead- 
ing to a single, partially hidden Markov process for each observed encounter 
history. This approach essentially extends the previous coarse discretisation 
approach of Nichols et al. (1992) by considering a very fine discretisation 
of the covariate space, coupled with specifying structured transition proba- 
bility matrices defined using a covariate process model. The corresponding 
likelihood can be written in a closed, efficient, matrix product form that is 
characteristic of hidden Markov models (HMMs) (Zucchini and MacDonald, 
2009). The likelihood is only approximate due to the discretisation, but the 
approximation can be made arbitrarily accurate. Notably, model selection 
can be carried out using standard model selection techniques. 

We perform a simulation study to assess the performance of the HMM- 
based approach before applying the method to data relating to Soay sheep 
{Ovis aries). The Soay sheep on the uninhabitated island of Hirta in the St 
Kilda archipelago are a well-studied biological system (Clutton-Brock and 
Pemberton, 2004). Intensive annual surveys involve physical recaptures of 
individuals, tagging of lambs, visual resightings and searches for dead car- 
casses. A range of individual covariate data are recorded for each sheep. We 
focus on the weight recorded, collected (when possible) when an individual 
is physically recaptured. We consider data relating to only females (since 
males and females have different life strategies), tagged as lambs between 
1985-2008 and recaptured/recovered annually from 1986-2009. 

The manuscript is structured as follows. Section 2 introduces the HMM- 
type estimation method for the specific MRR setting under consideration. 
An extensive simulation study investigating the performance of the proposed 
method, including a comparison to the trinomial approach, is given in Sec- 
tion 3. In Section 4, we analyse MRR data collected on Soay sheep, where 
the time-varying covariate of interest corresponds to body mass. 

2. Hidden Markov- type formulation of the MRR setting. We 

initially develop the form of the (partially) hidden Markov model for stan- 
dard MRR data (i.e., without any covariate information, in Section 2.1), 
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before extending to include individual time- varying continuous covariate in- 
formation (in Section 2.2). 

2.1. Formulation in absence of covariate information. 

2.1.1. General model formulation and notation. MRR data are typically 
most easily expressed in the form of the capture history of each individual 
animal observed within the study. We initially consider the probability of an 
encounter history for a given individual. Let T denote the number of capture 
occasions, and g the first time that the individual is observed. The capture 
history for the given individual is denoted by (x g , . . . , xt), such that 

!1 if the individual is observed at time t; 
2 if the individual is recovered dead in the interval (t — l,t]; 
otherwise. 

The encounter history can be regarded as the combination of two distinct 
processes: an underlying survival process and an observation process, condi- 
tional on the survival state of an individual. Thus, MRR data can be mod- 
elled via a (discrete) state-space model (i.e., HMM), separating the under- 
lying state process (i.e., survival process) from the observation process (i.e., 
recapture/recovery processes). For further discussion we refer the reader to 
Gimenez et al. (2007), Schofield and Barker (2008), Royle (2008), King et 
al. (2009) and King (2012). We define the survival process, (s g , . . . , st), such 
that 

(1 if the individual is alive at time t; 
2 if the individual is dead at time t, but was alive at time t — 1; 
3 if the individual is dead at time t, and was dead at time t — 1. 

Note that here we explicitly distinguish between "recently dead" individuals 
(st = 2) and "long dead" individuals (st = 3), and assume that only recently 
dead individuals can be recovered dead at a given capture event. This is a 
standard assumption within MRR models, due to the decay of marks for 
identifying individuals once they have died (although see, e.g., Catchpole 
et al., 2001, where this assumption is not valid). 

The likelihood of the observed capture histories is a function of survival, 
recapture and recovery probabilities. In particular, we set 

(ftt = P(st+i = l\st = 1) (Survival probability), 
Pt = P{xt = 1 1 = 1) (Capture probability), 
Xt = P(xt = 2\st = 2) (Recovery probability). 
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We note that the survival process is only partially observed (i.e., it is par- 
tially hidden) . For a capture history that includes a dead recovery, the corre- 
sponding survival process is completely known following initial capture (i.e., 
if x T = 2 then st = 1 for t = g, . . . , r — 1, st = 2 for t = r and st = 3 for 
t = r + l,...,T). Similarly, if an individual is observed at the final capture 
event, then the associated survival process following initial capture is also 
fully known (i.e., if X? = 1 then st = 1 for t = g, ... ,T). However, for all 
other histories the survival process following the final capture of the individ- 
ual is unknown. For notational convenience, we let S = {t > g : st is known} 
denote the set of all occasions at which the survival state of the individual 
is known, and S c the corresponding complement, i.e., the set of occasions at 
which the survival state is unknown, following initial capture. 

2.1.2. The likelihood. Conditional on the initial capture, the likelihood 
for a single capture history can be written in the form 



T 



(2.1) 



C 



e e n /n**- 

rG5 c s t G{1,2,3} t=g+l 



i)f(x t \s t ) 



taking into account all possible survival histories for the animal, given its 
observed capture history. For notational simplicity, we use / as a general 
symbol for a probability mass function or a density function, possibly con- 
ditional, throughout the manuscript. For example, here 



f(st\s t -i) 
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otherwise. 





f(x t \s t ) 



Expression (2.1) represents an inefficient way of computing the likelihood, 
since some impossible state sequences are taken into account (such as, e.g., 
...,1,2,1,1,...) that have a zero contribution to the likelihood. Clearly, 
only possible state sequences need to be evaluated, but we retain the full 
summation for notational simplicity. 
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An alternative expression for the likelihood is available using matrix prod- 
ucts. In particular, at time t, we define the transition probability matrix 
associated with the transitions between different survival states by I\, such 
that 

( <k l-<k 
T t = 
\ 

Furthermore, let Q(xt) denote the diagonal matrix giving the state-dependent 
probabilities of observations at time t on the diagonal: 

diag(l-pt, 1- X t ,l) if ar t = 0; 
Q(x t ) = { diag(pt,0,0) ifz t = l; 

diag(0,At,0) if a;t = 2, 

where diag(. . .) denotes the diagonal matrix with given diagonal elements. 
The likelihood (2.1) can then be written in the HMM form 

C = 8 I [] IViQte)] 1 3 

(2.2) = <5r 9 Q(x 9+ i)r 9+1 Q(x 9+2 )-...-r T _ 1 Q(x T )l 3 , 

where I3 denotes a column vector of length 3 with each element equal to 1, 
and S = (1, 0, 0) denotes the row vector giving the probabilities of occupying 
the different survival states at the initial capture occasion. The likelihood 
(2.2) is that of a partially hidden Markov model, and one effectively sums 
only over the unknown states, rather than over all possible state sequences. 
We further note that in general for MRR data, the likelihood can be calcu- 
lated more efficiently using sufficient statistics, but we introduce this form 
of notation here for facilitating the extension to time- varying individual co- 
variates (with a focus on continuous- valued covariates). In an MRR setting, 
the HMM-type matrix product likelihood form has previously been given by 
Pradel (2005), who also discusses the general benefits of being able to apply 
the powerful HMM machinery. 

2.2. Formulation in the presence of continuous-valued covariates. 

2.2.1. General model formulation and notation. We extend the HMM 
framework to allow for the inclusion of individual-specific, continuous co- 
variate information that varies over time. For example, this may correspond 
to the condition of the individual (where proxies such as weight or parasitic 
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load may be used). We consider a single time- varying continuous covariate, 
such that the survival probabilities are a deterministic function of this co- 
variate. The extension of the method to multiple covariates is, in principle, 
straightforward, although technically challenging and accompanied by large 
scale increases in computational time (see Section 5 for further discussion). 

Notationally, for a given individual we let yt denote the value of the co- 
variate at time t, t = g, . . . , T, and y = {yt : t = g, . . . ,T} the set of all 
covariate values. For all t > r such that x T = 2 the value of yt (i.e., the 
covariate value following the observed death) is not defined. We note that 
usually one observes yt when xt = 1, but there may still be cases where an 
individual is observed alive, but no covariate value is recorded. This may 
occur for example due to a resighting rather than a recapture of the indi- 
vidual, or time constraints making it infeasible to obtain covariate values 
for all individuals observed. We let W = {t > g : yt is observed} denote 
the set of times for which the covariate is observed. The corresponding ob- 
served covariate values are denoted by yyy = {yt ■ t G W}. Similarly we let 
W c denote the complement, i.e., the set of times for which the covariate is 
unobserved, excluding times for which it is known the individual is not in 
the study (i.e., before initial capture or when known to be dead), so that 
W c = {t > g : yt is unobserved}\{i > g : t G 5, st = 2, 3}. Finally, we let 
the set of missing covariate values be denoted by yyv c = {yt ■ t G W c }. 

We consider models in which the survival probability depends on the 
covariate, and assume that the probability of survival from occasion t to 
t + 1 is determined by the value yt- Typically a logistic regression of survival 
probability on covariate value is considered, so that 



see, e.g., North and Morgan (1979) and Bonner et al. (2010). 

Following Bonner et al. (2010), we assume an underlying model for the 
change in covariate values over time, specified by some first-order Markov 
process, f(yt\yt-i), for t = g+1, . • • , T. We set the function value of f{yt\yt-i) 
to one for s f = 2,3 (i.e., when an individual is dead). The covariate value 
may not be recorded at the initial capture, in which case we also require an 
underlying distribution on the initial covariate values, described by a prob- 
ability density function fo (but see remarks at the end of Section 2.2.2). 
Typically a random walk-type model is assumed for the underlying covari- 
ate model. For example, Bonner and Schwarz (2006) and King et al. (2008) 
consider models along the lines of 
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with at varying over time, and extensions thereof to allow for additional mod- 
elling complexities such as age-dependence. However, fitting such models in- 
volves some complexities, due to the unobserved covariate values, which need 
to be integrated out in order to explicitly calculate the likelihood function 
of the data. We discuss this in further detail next, and propose a likelihood- 
based approach that exploits the HMM machinery. 

2.2.2. The likelihood. The likelihood of the capture history and observed 
covariate values of an individual, conditional on the initial capture event, 
can be written in the form 

(2-5) C= [ ...[ £ £ f (y g ) 

J J r65 c s T e{l,2,3} 
T 

x II f(. s t\ s t-i,yt-i)f(xt\s t )f(yt\yt-i)dywc- 

t=g+l 

In general, the necessary integration within this likelihood expression is an- 
alytically intractable. In a Bayesian approach, the missing covariate values 
are typically treated as auxiliary variables that are essentially integrated out 
within the MCMC algorithm (King et al, 2009). However, model selection 
is usually complex in terms of the estimation of the Bayes factors or pos- 
terior model probabilities (although see King et al, 2008, and King et al, 
2009, with regard to the use of the reversible jump (RJ)MCMC for covariate 
selection and age/time dependence of the demographic parameters) and the 
potential sensitivity of these on the prior specified on the model parameters 
(see King et al, 2009, for further discussion). 

We adopt a classical maximum likelihood approach here, where we closely 
approximate the multiple integral appearing in the likelihood using numer- 
ical integration, essentially finely discretising the space of covariate values. 
This approach gives an approximation to the likelihood which can be made 
arbitrarily accurate by increasing the fineness of the discretisation. In many 
MRR settings, the computational effort required to obtain a very close ap- 
proximation is very reasonable, since one can write the approximate like- 
lihood in an efficient HMM-type matrix product (as shown below). The 
suggested strategy for approximating the likelihood has previously been suc- 
cessfully applied in finance in order to estimate stochastic volatility models 
(see, e.g., Fridman and Harris, 1998), but has a much wider scope as pointed 
out by Langrock (2011). 

Mathematically, we define an "essential range" for the covariate values, 
and split this range into m intervals of equal length, where m is some large 
number (e.g., m = 100). Let the jth interval be denoted by Bj = [bj-±, bj), 
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j = 1, . ..,m. The essential range corresponds to a lower and upper bound 
for the possible covariate values, given by bo and b m , respectively. We let 
b*- denote a representative point in Bj . For large m the choice of this point 
only plays a very minor role, and throughout this manuscript we will simply 
use the interval midpoint. The likelihood (2.5) is then approximated by 

(2.6) 

EE E /o(%) /{s6W} / M*)dz) 

KeW c i K =lre5 c s T G{l,2,3} VJb i 9 -l J 

T p 

x n fistlst-uyt^y^-^yfistls^ub^y^-^^yfixtlst) 
t=g+l L 

x /( w G S it |y t _ 1 ) / {(*-i)sw,*6Wc }/ ( m G B it |6^ _ i ) J {(*-i)6W=,t 6W c } 

where I denotes the indicator function. The final two lines correspond to the 
likelihood contribution of the underlying model for the covariate process, and 



f(Vt £ Bj\z) = / f(y t \z)dy t 



Note this is essentially the same numerical integration strategy that has pre- 
viously been implemented by Langrock (2011) and Langrock et al. (2012); 
see the latter reference for more details. The major difference to those ap- 
proaches is that here we allow for some covariate values to be observed, 
and hence do not integrate over the observed covariate values. Also here 
there is the additional difficulty of a second level of missing values, given by 
those St with t £ S c , which need to be summed over. Alternative numerical 
procedures for evaluating the likelihood are discussed in Section 5. 

The likelihood (2.6) can be written in an efficient HMM-type matrix nota- 
tion. This makes maximum likelihood estimation feasible and has the general 
benefit that the well-developed HMM machinery becomes applicable. To do 
this, we essentially augment the "alive" survival state by dividing it into 
m distinct states, corresponding to "alive and with covariate value in Bj\ 
j = 1, . . . ,m. The complete state space of the (partially) hidden process - 
now giving survival state and covariate value - comprises these m states 
plus the "recent dead" (state m + 1) and the "long dead" (state m + 2) sur- 
vival states. To obtain the matrix product form of the likelihood, we extend 
the HMM form described in Section 2.1.2, allowing for the augmentation of 
the single alive state St = 1 to the set of m states. In particular we need 
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to extend the definitions of the (system process) matrix Tt and of the ob- 
servation matrix Qt, and to include an initial distribution for the covariate 
values (assuming that these are not always observed). First, we define the 
(m + 2) x (m + 2) system process matrix 



-,(m) 



/ *t(M) 

*t(m,l) 





* t (l,m) 1-E^=i*t(l,i) o\ 

* t (m,m) l-E™=i*tKi) 

1 

1/ 



where 
*t(i,i) = 



70t+i 


= l|st 


= l,y*)/(yt+i|yt) 


/Ot+i 


= l]st 


= l,6*)/(y t+1 |fet) 


< /(*t+i 


= l|«t 


= i,yt)f(yt+i e Sj|yt 


/(«t+i 


= 


= l,6?)/(y t+1 GB,|&? 




V 







if t, t + 1 G W, y t G Bi, y t+1 e By, 
ifteW c ,t + leW,y t+ ieBf, 
if t G + 1 G W c ,y t e 
if t, t + 1 G W c ; 
otherwise. 

Here ^t{hj) corresponds to the probability of the individual surviving from 
time t to time t + 1, with the covariate value changing from a given value in 
the interval Bi at time t (either the observed covariate value or the mid- 
point value) to some value in the interval Bj at time t + 1 (either the 
observed covariate value or any point within the interval). We note that 
this formulation is similar to the Arnason-Schwarz model, where the tran- 
sition probabilities are defined between discrete states. However, within our 
model specification the transition probabilities are of a more complex form, 
as they are determined via the underlying model specified on the covariate 
process (rather than estimated freely), and also as they depend on whether 
the (continuous) covariate value is observed or not. For example, the prob- 
ability f(yt+i £ Bj\b*) is determined by the model used for the covariate 
process. If the model given by (2.4) is considered, then this probability is 
given by 



bj - (b* + at) 



(7 



a 



where $ denotes the cumulative distribution function of the standard normal 
distribution. 

We now consider the matrix comprising the state-dependent observation 
probabilities, which is a diagonal matrix of dimension (m + 2) x (m + 2), 
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such that 

!diag(l - p t , . . . , 1 - pt, 1 - A t , 1) if x t = 0; 

diag(p t ,...,p t ,0,0) ifx t = l; 

diag(0,...,0,A t) ifx t = 2. 

Finally, one may need to model the initial distribution for the covariate value 
(since the initial value may not be observed) . In general the distribution will 
depend on the model assumed for the covariate proces. For a probability 
density function of initial covariate values given by fo, we define the row 
vector 5^ of length m + 2 such that 

\!bLh{z)dz if geW c ,i e {I,..., m}, 

[O otherwise. 

If all initial covariate values are observed and the initial covariate distri- 
bution itself is not of interest, then one can set 6^ = 1 for g E W, y g G 
Bi, which corresponds to conditioning the likelihood on the initial covari- 
ate value (with the advantage that less parameters have to be estimated). 
Putting all these components together, the matrix formulation of (2.6) is 

c = *M ( n r; m jQ(-)(x t ) ) l m+2 

\t=g+i J 
(2.7) = jWrHQH( Sff+1 )rHQH( Ij+2 ) ..... rgQ( m >(, T )i m+2 , 

i.e., the likelihood has exactly the same structure as in the case of absence 
of covariates (cf. Expression (2.2)). It should perhaps be emphasized here 
that although (2.7) has precisely the same structure as an HMM likelihood 
(and hence can easily be maximized numerically), it is not the likelihood of 
an HMM, since (for any given t) the rows of the matrix do not sum 

to one. This is because some of the covariate values are known, and also 
because we restrict the range of covariate values to some essential range. 

2.2.3. Inference. For multiple individuals, the likelihood is simply the 
product of likelihoods of type (2.7), corresponding to each encounter his- 
tory. It is then a routine matter to numerically maximize this joint likeli- 
hood with respect to the model parameters, subject to well-known technical 
issues arising in all optimization problems; see Chapter 3 in Zucchini and 
MacDonald (2009) for a detailed account of the particular issues that arise 
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in the case of HMMs. Approximate confidence intervals for the parameters 
can be obtained based on the inverse of the estimated information matrix, 
or, alternatively, using a parametric bootstrap. Model selection, including 
for the underlying covariate process model, can easily be carried out using 
model selection criteria such as the Akaike information criterion (AIC). 

The accuracy of the likelihood approximation increases with increasing m. 
The influence on the estimates can be checked by considering different values 
of m: if for some relatively large m a further increase does not change the 
likelihood value and/or the estimates, then this is a very strong indication 
that m is sufficiently large to ensure a very close approximation. From our 
experience we suggest using 30 — 100 intervals in the discretisation (cf. the 
simulation study in Langrock et al., 2012). 

We note that the computational expense is not only a function of m and 
T and of the proportion of missing covariates, but also of the pattern that 
the missing values occur in. Consecutive missing covariate values lead to the 
highest computational burden (since they imply that all entries of the cor- 
responding system process matrix associated with the underlying covariate 
process need to be calculated, a total of m 2 entries). If an unobserved covari- 
ate value is followed by an observed covariate value, then the corresponding 
system process matrix consists of only one column with non-zero entries 
(and likewise, if an observed covariate value is followed by an unobserved 
covariate value, then there is only one row with non-zero entries). Consec- 
utive observed covariate values are clearly least computationally intensive 
(the system process matrix then consist of only one non-zero element). 

3. Simulation study. In this section, we present the results of a simu- 
lation study for evaluating the performance of the HMM-based method. As a 
benchmark method we consider the trinomal method suggested by Catchpole 
et al. (2008), which currently appears to be the most popular classical infer- 
ence method for MRR models with continuous-valued covariates (Bonner et 
al., 2010). We considered four different simulation scenarios, using different 
values for the recapture and the recovery probabilities, respectively. Table 1 
gives the combinations of these parameters that were considered. The differ- 
ent scenarios represent, inter alia, different amounts of information on the 
survival states (the lower A and the lower p, the less information) and on the 
covariate values (the lower p, the less information), respectively. For each 
of the scenarios we conducted 500 simulation experiments, in each experi- 
ment considering simulated capture histories for N = 500 individuals, each 
of them observed on at most T = 10 occasions. For each individual the time 
of the initial capture occasion was chosen uniformly from {1, . . . , 9}. 
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Table 1 

Configurations of true recovery and recapture probabilities used in four different 

simulation scenarios. 

Scenario p A 

"1 095 0.95 

2 0.90 0.30 

3 0.30 0.90 

4 0.30 0.30 



In each scenario we used the same underlying process to generate the 
covariate values. More precisely, for each individual we generated the values 
of the covariate process using an autoregressive-type process of order 1 with 
a deterministic (sine-shaped) trend: 

y t - 25 = r](jjt-i - 25) + a t + ae t , 

where at = 7 sin(27rt/T) and et l ~ A/"(0, 1). In all scenarios we used the 
following values for the parameters determining the covariate process: r/ = 
0.6, a = 1.2 and 7 = 2. For the initial covariate distribution, associated 
with the first capture event, we used a normal with mean 15 and standard 
deviation 2. We assume a logistic link function for the survival probabilities 
regressed on the covariate values, with intercept Pq = —3 and slope Pi = 0.2. 
For this model the survival probability is 0.5 for Wt-x = 15 and greater than 
0.9 for Wt-i > 26. The parameter values were chosen roughly similar to 
those estimated in the application to Soay sheep MRR data given in Bonner 
et al. (2010). In particular, a typical covariate time series starts at around 
15 at the initial capture occasion, over the years approaches 25 and then 
fluctuates around that value. The deterministic trend at was included to 
enable us to conduct a simple check for robustness of our method to model 
misspecification (see below). 

We here focus on the estimation of the parameters Pq and /3i , and in each 
case give the following summary statistics: sample mean estimate, sample 
mean relative bias ((Pi— Pi) / Pi), 2.5 and 97.5% quantiles of the relative bias, 
sample mean width of the estimated 95% confidence intervals and coverage 
probability of the confidence intervals. Confidence intervals were obtained 
based on the estimated Hessian matrix. For the HMM-based method we 
considered three different covariate process models in the simulation exper- 
iments: 1) the correctly specified model (i.e., the one that was used for sim- 
ulating the data; denoted model HMM-C), 2) a slightly misspecified model 
which assumes a homogeneous AR(1) for the covariate process (i.e., one 
that neglects the deterministic sine-shaped component of the trend; model 

imsart-aoas ver. 2012/08/31 file: paper_AoAS.tex date: November 6, 2012 



14 



R. LANGROCK AND R. KING 



HMM-M1), and 3) a substantially misspecified model which assumes that 
at all ages (and across all individuals) the covariate is independently and 
identically normally distributed, with mean and standard error being esti- 
mated in the simulation experiments (model HMM-M2; this model neglects 
both trend components as well as the correlation over time). The latter 
two explore the robustness of our method to misspecification of the covari- 
ate process model. In the implementation of the HMM approach we used 
m = 40 intervals in the discretization of the covariate space, and the func- 
tion nlm in R to maximize the approximate likelihood numerically. In the 
implementation of the trinomial approach we used the function optim in R 
instead, since nlm had problems in estimating the Hessian when p or A are 
estimated at the boundaries of their support (which happens occasionally 
when using the trinomial method). Results are provided in Table 2. 

Table 2 

Sample mean estimates (ME), sample means (RB) and 2.5 and 97.5% quantiles of the 
relative biases, sample mean widths ( CW) of the estimated 95% confidence intervals and 
coverage probabilities (CC) of the confidence intervals, for the logistic regression 
parameters /So and fii , in four simulation scenarios. 



Sc Meth. 



ME 



intercept (/?o 

RB(go.025,?0.975 



-3) 
CW CC 



ME 



slope (/3i = 

RB(go.025,90.975 



2) 

CW 



CC 



Tri 

HMM-C 
HMM-M1 
HMM-M2 
Tri 

HMM-C 
HMM-M1 
HMM-M2 
Tri 

HMM-C 
HMM-M1 
HMM-M2 
Tri 

HMM-C 

HMM-M1 

HMM-M2 



-3.00 
-3.00 
-3.00 
-3.00 
-3.01 
-3.01 
-3.00 
-3.08 
-3.09 
-3.00 
-3.00 
-2.96 
-2.98 
-2.99 
-3.02 
-3.26 



0.00(- 
0.00(- 
0.00(- 

-0.01(- 
0.00(- 
0.00(- 
0.00(- 

-0.03(- 
0.03(- 
0.00(- 
0.00(- 
0.01(- 
0.00(- 
0.00(- 

-0.01(- 

-0.09(- 



0.23,0.22 
0.24,0.23 
0.24,0.22 
0.24,0.22 

0.28,0.26 
0.24,0.22 
0.24,0.21 
0.28,0.20 
0.52,0.34 
0.26,0.24 
0.27,0.26 
0.28,0.28 
0.58,0.60 
0.30,0.32 
0.34,0.33 
0.45,0.30 



1.39 0.96 

1.33 0.94 

1.34 0.94 
1.34 0.95 
1.69 0.95 

1.37 0.95 

1.38 0.96 
1.41 0.95 
3.08 0.97 
1.46 0.94 
1.50 0.93 
1.57 0.93 
3.73 0.98 
1.92 0.95 
2.01 0.95 
2.20 0.92 



0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.20 
0.22 



0.001 
0.00( 
0.00( 
0.0K 
0.00( 
0.00( 
0.00( 
0.02( 
0.021 
0.00( 
0.0K 
-0.0K 
0.0K 
0.00( 
0.02( 
0.09( 



-0.20,0.20 
-0.19,0.21 
-0.18,0.21 
-0.19,0.21 
-0.27,0.30 
-0.18,0.20 
-0.18,0.20 
-0.17,0.23 
-0.26,0.35 
-0.21,0.22 
-0.21,0.24 
-0.23,0.23 
-0.45,0.57 
-0.26,0.25 
-0.26,0.31 
-0.23,0.40 



0.08 
0.07 
0.08 
0.08 
0.12 
0.08 
0.08 
0.08 
0.14 
0.08 
0.09 
0.09 
0.20 
0.11 
0.11 
0.13 



0.94 
0.93 
0.94 
0.94 
0.95 
0.95 
0.95 
0.94 
0.97 
0.95 
0.94 
0.94 
0.95 
0.95 
0.95 
0.92 



In all four simulation scenarios, the interval estimates obtained using the 
HMM-based method were narrower than those obtained using the trino- 
mial method, with the differences being substantial in Scenarios 3 and 4 
(those with low capture probabilities). Using the HMM-based method, with 
both the correct specification (HMM-C) and with a slight misspecification 
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(HMM-M1) of the model for the covariate process, no significant bias was 
found in the estimates of the logistic regression parameters (in each sce- 
nario). The experiment involving a substantial misspecification of the co- 
variate process model (HMM-M2) led to a 9% negative bias in Scenario 4, 
whereas in all other scenarios there still was only a small bias. In all consid- 
ered settings, coverage probabilities of the interval estimates were close to 
95%. We note that it is immediate to consider a model selection approach 
for the underlying covariate process, for example using the AIC statistic. For 
the present simulation experiment, the correct underlying covariate model 
(model HMM-C) was deemed optimal by the AIC statistic in all 500 sim- 
ulation runs (when compared to the models HMM-M1 and HMM-M2, re- 
spectively). 

4. Application to Soay sheep data. We consider capture histories 
for Soay sheep that were observed on the Island of Hirta, Scotland, between 
1985 and 2009. For details on the population dynamics of the Soay sheep 
we refer to Clutton-Brock and Pemberton (2004). We consider only female 
sheep that were born from 1985 onwards, leading to a total of 1344 individual 
capture histories. The mean number of observations per sheep was 4.64 
with a total of 900 sheep recovered dead during the observation period. 
We assume that the survival probabilities are a function of body mass. Not 
all observations are associated with the animal being physically captured, 
and thus for 38% of the observations the corresponding body mass was 
not recorded. Following Bonner et al. (2010), we consider four different age 
groups: lambs (age < 1), yearlings (age £ [1,2)), adults (age £ [2,7)) and 
seniors (age > 7). We assume a logistic relationship between the covariate 
weight and the survival probability, so that 

logit(^) = Pa t ,0 + Pa t ,Wt- 

For a given sheep, at indicates the age group the sheep is in at time t (lamb, 
yearling, adult or senior). We consider five different possible models in total, 
summarised as follows: 

Model 1: y t = yt-i + rj at (n at —yt-i) + cr at £t, time-dependent recapture 
probabilities, time-dependent recovery probabilities (68 parameters); 

Model 2: y t = yt-\ + f] (fJ- — yt-i) + o^t (i-e., covariate process parameters 
fixed across age groups), time-dependent recapture probabilities, time- 
dependent recovery probabilities (59 parameters); 

Model 3: Same covariate model as for model 1, constant recapture proba- 
bility, constant recovery probability (22 parameters); 
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Model 4: Same covariate model as for model 1, constant recapture proba- 
bility, time-dependent recovery probability (45 parameters); 

Model 5: yt = yt-i + Ha t + a a t e t, time-dependent recapture probabilities, 
time-dependent recovery probabilities (64 parameters). 

For the covariate process models, €% denote independently and identically 
distributed standard normal random variables. Model 5 has a covariate pro- 
cess model similar to those used by Bonner and Schwarz (2006), King et 
al. (2008) and Bonner et al. (2010) (although, for example, Bonner et al., 
2010, assume /j, not only depends on the age group of the sheep but also 
on the year and King et al, 2008, consider a further additive year effect). 
Notably, this covariate process model is diffusive and thus, in general, not 
biologically realistic (see later discussion) . Besides illustrating the suggested 
HMM-based approach, the main objective of the given analysis is to demon- 
strate that it is straightforward to consider a model selection procedure on 
both the observation process model and the underlying covariate process 
model, rather than to perform a full investigation of the dependence struc- 
ture of the parameters and factors that affect the survival of the individuals. 

Each of the models was fitted using the HMM-based approach and m = 50 
intervals in the discretization, with the assumed essential range of covari- 
ate values given by b = 0.8b min and b m = 1.2b max , where b min and b max 
denote the minimum and the maximum of the observed covariate values, 
respectively. For the given data we have b m i n = 2.9 and b max = 33.9. For 
the initial covariate value we assumed a normal distribution, and estimated 
the corresponding mean and variance parameter alongside the other param- 
eters. The AIC values obtained for the five different models described above 
are provided in Table 3. Clearly, model 1 is identified as optimal via the AIC 
statistic by quite a substantial margin. 

Table 3 

Log-likelihood, number of parameters (p) and AIC values for different models, fitted to 

the Soay sheep data. 

log£ p AIC A AIC 

10221.85 70 20583.70 

10350.66 61 20823.32 239.62 

10309.28 24 20666.56 82.86 

10260.68 47 20615.36 31.66 

10404.45 66 20940.90 357.20 



model 1 - 

model 2 - 

model 3 - 

model 4 - 

model 5 - 



Figure 1 displays the estimated survival probabilities for model 1 for the 
different age groups, and in each case as a function of the covariate body 
mass. Pointwise confidence intervals were obtained via the delta method. 
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The results are similar to those of Bonner et al. (2010) based on the data 
from 1986-2000. The survival probability increases with increasing weight in 
each age group, with this effect found strongest for lambs and seniors and 
weakest for adults. 



lambs yearlings 




weight weight 

Fig 1 . Estimated survival probability as a function of the covariate body mass ( weight, in 
kg), for the four different age groups. Solid lines give the maximum likelihood estimates, 
and dashed lines indicate the 95% pointwise confidence intervals. 

In Figure 2, the observed body masses of sheep at ages 0-12 are compared 
to the model-derived distributions of body masses (of alive sheep) for these 
ages. We omitted models 3 and 4 since the covariate process model in these 
models is identical to that of model 1. Models 1 and 2 appear to capture 
the development of the body mass over the years. However, the diffusive na- 
ture of the covariate process in model 5 leads to increasingly wider interval 
estimates for body mass as age increases, with the intervals not capturing 
well the observed quantiles. Thus, as already identified via the AIC statis- 
tic, it appears that auto-regressive type covariate process models are more 
appropriate in this application. 
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j f t T j 









2 4 6 8 10 12 

age (in yrs.) 



model 2 




2 4 6 8 10 12 

age (in yrs.) 



model 5 




Fig 2. Observed body masses of sheep at ages 0-12 (tiny black dots), empirical 5% and 
95% quantiles (big grey dots) and empirical medians (big black dots) of body masses at 
those ages, and model-derived 5% and 95% quantiles (dashed grey lines) and medians 
(solid black lines) of body mass distributions of alive individuals at those ages (obtained 
through simulation); for fitted models 1, 2 and 5. 



5. Discussion. In recent years, several different methods have been 
proposed that address MRR studies that involve individual-specific and 
time- varying continuous covariates. The most popular approaches for fit- 
ting models to this type of data are the trinomial method (Catchpole et 
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al., 2008) and the Bayesian imputation method (Bonner and Schwarz, 2006; 
King et al, 2008, 2009; Schofield and Barker, 2011). The former method 
avoids assumptions concerning the underlying model for the covariate pro- 
cess. However, it disregards a potentially significant amount of information 
in the data, which can lead to poor precision of the parameter estimates. 
Use of the trinomial approach is not recommended if capture probabilities 
are low (Bonner et al, 2010) or clearly if the underlying covariate process is 
of interest in itself. The Bayesian approach makes use of all available infor- 
mation in the data and thus usually leads to an improved precision of the 
estimators (provided a correct specification of the covariate process model). 
However, model selection is generally more difficult and prior distributions 
need to be specified on the parameters in the model. 

The proposed HMM-based method for estimating such MRR models is 
based on an approximate expression for the likelihood which is obtained 
via numerical integration. The underlying idea is to discretize the space of 
covariate values, which reduces the multiple integral appearing in the like- 
lihood to a multiple sum. The resulting multiple sum can then efficiently 
be calculated by rewriting it as a matrix product which is characteristic of 
HMMs. While the fitting is based on maximizing only an approximation to 
the likelihood, it is very easy to make this approximation extremely accurate 
(by considering increasingly finer discretisations of the covariate space, and 
possibly a larger essential range), while maintaining computational tractabil- 
ity in typical MRR settings. 

The simulation study demonstrated that if the covariate process is mod- 
elled adequately, and even if the model is misspecified to some degree, then 
the HMM-based approach leads to more precise estimates than does the tri- 
nomial method. Statistical inference on the model parameters can be con- 
ducted either based on the Hessian or using a bootstrap. The method is 
easy to implement and to apply (R code is provided in the supplementary 
material Supplement A), and, once it is implemented, changes of the model 
structure usually only require very minor and straightforward changes in the 
code, making this method very user- friendly. It is also possible to formally 
(and simply) compare competing covariate process models using standard 
information criteria, such as the AIC statistic. Model checking of the covari- 
ate process model can be performed by comparing the observed covariate 
values with those obtained from the fitted process model, for example using 
graphical means to assess a lack of model fit. Model selection can be per- 
formed also on the observation process parameters, and simultaneously with 
the covariate process model. This is illustrated in the Soay sheep application 
in Section 4, where the AIC statistic was used to compare competing mod- 
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els. Previous covariate process models that have been suggested for these 
type of data (including the Soay sheep) are typically of the form of diffusive 
random walks. For this application, an alternative non-diffusive AR(l)-type 
model appears to provide a significantly better fit. However, the purpose of 
this analysis is not to identify the model that fits these data best, but to 
demonstrate the HMM-based model fitting approach. 

The HMM-based approach can be extended in different ways. The exten- 
sion to allow the observation model parameters to be dependent on the indi- 
vidual covariate is straightforward, with minimal additional computational 
effort - the only change that is required relates to the matrix comprising the 
state-dependent probabilities. A drawback of the HMM-based approach is 
that the computational effort increases dramatically if multiple continuous, 
individual-specific and time-varying covariates are considered. However, we 
anticipate that using more sophisticated numerical procedures in the likeli- 
hood approximation, such as, e.g., Gauss-Legendre, will at least render the 
case of two such covariates feasible even for (relatively) large MRR data 
sets. In general, it may also be worthwhile to consider alternative numerical 
approaches for evaluating the likelihood, such as, e.g., simulated maximum 
likelihood (which is often used in stochastic volatility modeling; see, e.g., 
Durbin and Koopman, 1997). Another extension that is straightforward in 
principle, but accompanied by large scale increases in computational time is 
that to models involving random effects (see, e.g., King et al, 2008, for an 
account in an MRR setting in a Bayesian framework, and Schliehe-Diecks 
et al., 2012, Langrock et al., 2012, for implementations of similar models in 
a non-Bayesian HMM framework in other ecological applications). 

SUPPLEMENTARY MATERIAL 

Supplement A: R code for model fitting 

(http://www.e-publications.org/ims/support/...). Sample R code for simu- 
lating MRR data and fitting the corresponding model using the HMM-based 
approach (with MRR model as described in Section 3). 
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